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Abstract — We propose a dynamic programming algorithm for 
projection onto wavelet tree structures. In contrast to other 
recently proposed algorithms which only give approximate tree 
projections for a given sparsity, our algorithm is guaranteed to 
calculate the projection exactly. We also prove that our algorithm 
has 0{Nk) complexity, where TV is the signal dimension and k 
is the sparsity of the tree approximation. 

Index Tenns — Sparse representations, wavelets, dynamic pro- 
gramming, complexity analysis, compressed sensing. 
EDICS categories: DSP-SPARSE, MLSAS-SPARSE. 



I. Introduction 

The discrete wavelet transform is a much-used tool in 
digital signal processing, especially in image processing, since 
it provides sparse representations for piecewise-smooth sig- 
nals yj. Wavelets have a multi-scale structure in which a 
signal is convolved with dilations and translations of some 
'mother wavelet', which induces a natural dyadic tree structure 
upon the wavelet coefficients. Since wavelets essentially detect 
discontinuities, large wavelet coefficients of piecewise smooth 
signals are often propagated down branches of the tree. This 
suggests that sparse approximations of such signals have 
additional structure: they are tree-sparse, by which we mean 
that they are supported on a rooted subtree ||2l, see Section [HI 
This paper concerns the task of finding an exact tree projection, 
namely a tree-sparse vector of a given sparsity which is closest 
in Euclidean norm to some given signal vector 

Tree projections, or approximations thereof, predate the 
arrival of wavelets. An early example of their computation 
is found in the pruning of decision tree structures known as 
classification and regression trees (CART) 131, f?!, (5\. With 
the emergence of wavelets, tree approximations also began to 
be used in the context of wavelet-based compression Q, Q 
and orthogonal best-basis methods fS), ||5], Q. More recently, 
algorithms for tree-based Compressed Sensing have been pro- 
posed which require the computation of tree projections ifTOl . 
llll. Furthermore, recovery guarantees have been obtained for 
these algorithms which depend crucially on the assumption 
that an exact tree projection is computed in every iteration of 
the algorithm |10|, [2J. 

Several algorithms have been proposed for wavelet tree 
projection, and the summary by Baraniuk in ifTTI describes 
three possible approaches: Greedy Tree Approximation (GTA), 
the Condensing Sort and Select Algorithm (CSSA), and 
the Complexity-Penalized Sum of Squares (CPSS). GTA 171 
proceeds by growing the tree from the root node in a 
'greedy' fashion by making a series of locally optimal choices. 
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However, such an approach is only guaranteed to find an 
exact tree projection in the idealized case in which the 
wavelet coefficients are monotonically nonincreasing along 
the branches fTTl. The CSSA was originally proposed by 
Baraniuk and Jones [12J in the context of optimal kernel 
design, and solves a linear programming (LP) relaxation of 
the exact tree projection problem. The CPSS approach has 
its roots in the minimal cost-complexity pruning algorithm for 
CART O, and was later developed by Donoho in the context 
of wavelet approximations JSj, 0. This algorithm solves a 
Lagrangian relaxation of the exact tree projection problem, 
in which the sparsity constraint is removed and penalized in 
the objective function instead. We argue in Section III] that, 
since both the CSSA and CPSS solve relaxations, neither 
algorithm is guaranteed to find an exact tree projection for 
a given sparsity. Since the theoretical results mentioned above 
require an exact tree projection, a crucial question is whether 
it is possible to guarantee the computation of an exact tree 
projection in polynomial time. 

In this paper, we propose an algorithm for exact tree pro- 
jection in the context of wavelets. We make use of a Dynamic 
Programming (DP) approach which has been used before in 
other contexts. An early reference is Breiman et al. PI, where 
it is proved that such an algorithm exists in the context of 
CART decision trees. The algorithm, referred to as optimal tree 
pruning, was then formally stated by Bohanec and Bratko |4], 
who also proved that it has complexity 0{N'^), where N is 
the signal dimension. The same exact tree projection approach 
was also used in the context of orthogonal best-basis selection 
in |8|. Our algorithm adopts the same basic approach as in Q, 
but this time in the context of wavelet approximation; further 



distinctions to |4| are given in Section III We also prove that 



our algorithm has low-order polynomial complexity, namely 
0{Nk), where k is the sparsity of the tree approximation, 
which represents an improvement over the result in [4|. 

Some closely-related recent work lfT3l proves that there 
exists a polynomial-time DP algorithm for projection onto 
any tree structure which has a loopless pairwise overlapping 
group property. We should point out, however, that wavelet 
transforms may not exhibit this property. 

II. Further motivation and preliminaries 

We frame the discussion in terms of standard Z3-dimensional 
Cartesian-product dyadic wavelets, which have a particular 
canonical 2^-ary tree structure. We assume a tree structure 
in which each coefficient has a maximum of d children, 
where d is the tree order. The Cartesian-product structure 
implies that d = 2^, where the transform dimension D 
is some fixed positive integer, and so d is a fixed integer 
with d > 2. Also, let N — d' for some J > 2, where 



J is the number of levels in the tree structure. We assign 
a numbering {i '■ 1 < i < N} to the nodes of the tree, 
starting with the root node and proceeding from coarse to 
fine levels. Let the root node i = 1 be at level 0, and have 
d— \ children at level 1, namely {2, . . . , d}. Let each of the 
remaining nodes except those in level J (the finest level), that 
is {i : 2 < i < d'^^^ — N/d}, each have d children, namely 
{d{i — 1) + 1, ... , di} respectively. The nodes in the finest 
level J, that is {i : i > d'^^^ — N/d}, are assumed to have 
no children; see Figure [T] 




Fig. 1. An illustration of the canonical dyadic tree structure considered here, 
for the case of d = 2 (binary tree) and J = 4. 



Following [2], we define a rooted tree of cardinality k to 
be some subtree consisting of k nodes of the full tree just 
described, subject to the restrictions that it must include the 
root node, and that if any node other than the root node is 
included then its parent node must also be included. We write 
Tfe for the set of supports of all rooted trees of cardinality 
k. Given a vector y e M^, we are interested in finding the 
Euclidean projection Vk{y) of a vector y £ M^ onto the set 
of vectors with support in 71-, namely 



Vk[v) 
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(1) 



where supp(z) denotes the support of z. Provided the above 
numbering scheme is used to distinguish between supports 
with precisely the same approximation error, Vk{y) is well- 
defined. Intuitively, the following lemma establishes that a tree 
projection preserves the values of selected coefficients. 
Lemma 1: Let Vki-) be defined as in (IT]). Then 



{'Pk{y)},-^{ q' *^p where L := argmax Hy^l' 



oeTfc 



(2) 



Proof: Let supp(z) = F G Tk- Then 

\\z~yr^\\{z~y)r\\' + \\yvc\;\ 

and 11(0 — y)r|| is minimized by setting Zi = yi for all z G F. 
Meanwhile, Hyr^^ll is minimized by choosing F to be the set 
Vt £Tk which maximizes ||ysi||, and the result now follows. D 
Lemma [T] implies that tree projection is a decision problem: 
since coefficient values are preserved, the projection is ob- 
tained by identifying the set F G 7fe on which y has maximum 
energy. It follows that Vkiy) may also be found by solving 



the following Integer Program (IP): 

r >0 

{tj} tree-nonincreasing 

Tl = l. 



N 



max } y^Ti 



-1 



subject to 



(3) 



Here tree-nonincreasing means that the coefficients do not in- 
crease along the branches, a condition which may be translated 
into a series of linear constraints. Note that the assumption 
of integrality, together with the constraints, in fact forces 
Ti G {0, 1} for each i, so that r acts as a decision variable. 
Denoting the optimal solution of ([3]l by r*, the coefficients of 
the best tree approximation are then ytT* . 

We can now be more precise about our claim in Section |l] 
that both the CSSA O and the CPSS fS) solve relaxations 
of the tree projection problem, and hence are only guaranteed 
to give approximate tree projections. 

The CSSA solves an LP relaxation of ^ in which one 
dispenses with the assumption of integrality while retaining 
all other constraints. The authors of I.12J observe that the two 
solutions are sometimes equal, but not always: a value of r; = 
1 may be assigned to all but a few coefficients, which are each 
assigned a value f for some < f < 1. In this case, the result 
is a tree of size strictly greater than k. Moreover, there is no 
straightforward method for 'adjusting' the solution a posteriori 
to obtain the optimal fc-sparse tree projection. 

Meanwhile, the CPSS IJS] solves a Lagrangian relaxation 
of (l3| in which the equality constraint ^ t^ = fc is removed 
and penalized in the objective instead. This reformulation can 
be solved using fine-to-coarse dynamic programming on the 
tree |5|, |9|. Because ^ is an IP, its Lagrangian relaxation 
is also not guaranteed to share the same optimal solution 
as ^ mi. A further drawback of the CPSS is the non- 
obvious relationship between the Lagrange multiplier A and 
the required sparsity, which means that the problem must in 
practice be solved for multiple values of A in order to find a 
tree approximation for a given sparsity. 

III. The FTP ALGORITHM 

Our algorithm (Algorithm 1) falls into the broad category 
of dynamic programming (DP) algorithms which optimize on 
directed graphs by utilizing a principle of optimality, namely 
that optimal solutions at a given node may be determined 
entirely from optimal solutions at 'preceding' nodes |15|. Our 
algorithm makes two passes through the tree: firstly, a fine-to- 
coarse pass finds optimal subtrees at each node for all k < k. 
Secondly, a coarse-to-fine pass tracks back to identify the 
optimal solution. 

The algorithm starts at the finest level and moves up the tree, 
finding optimal subtrees rooted at each node, each decision 
requiring only the information already obtained within that 
particular subtree. Following [41, optimal subtrees for a given 
node are obtained by successively incorporating each of its 
children, and repeatedly optimizing over all known solutions. 
To maximize efficiency, optimal subtrees are in fact only found 
for all cardinalities which could possibly contribute to a rooted 
tree of cardinality fc, which imposes two restrictions. Firstly, 
by summing the appropriate geometric series, the maximum 



cardinality of a subtree rooted at a node in level j is ^—jzrr~^- 
Secondly, any subtree rooted at a node in level j of cardinality 
greater than k—j would necessarily contribute to a rooted tree 
of cardinality greater than k. Defining 

l{j):^mini—j-^—,k-jj 1 < j < J, (4) 

we therefore need only find optimal subtrees for cardinalities 
up to l{j) for nodes at level j. Further restrictions are imposed 
due to the fact that children are incorporated sequentially. 

F{i,l) denotes the total energy of the optimal subtree 
rooted at node i, of size I, and G{i,l) — [Gr{i,l) '■ r — 
1, . . . , d] e M'' is a vector of nonnegative integers, giving the 
number of nodes contributing to this optimal solution from 
the subtree rooted at each of the children of node i. We 
then proceed recursively and eventually determine F(l.fc), 
the energy of the optimal tree of size /cQ Finally, we use 
the precedence information to trace the optimal solution back 
along the branches. The algorithm actually does more than is 
asked for: by the time the root node is reached at the end of the 
first pass, enough information has been obtained to determine 
optimal trees for all k < k. 

FTP is closely related, but not identical, to the optimal 
tree pruning algorithm in |4|, where the decision tree context 
motivates the solution of a slightly different problem from 
the present tree projection problem. In their setup, if a node 
is pruned, then so are all other nodes sharing the same parent 
in the tree structure. Also, an optimal pruning is obtained for 
a given number of leaves (ends of branches of the tree), as 
opposed to the cardinality of the tree approximant. There is 
also a difference in how information is stored: the proposal 
in m is to store the full optimal subtrees themselves at each 
node, essentially incorporating the backtracking phase into 
the first pass of the algorithm. 

Algorithm 1. Exact tree projection (ETP). 

f d e N (d > 2) 
Inputs: < yeR^ (N ^d-'; J>2) 

[ k£N i2< k<N) 
Find all optimal subtrees: 

for i = {d^-^ + 1) ■.d'' do 

F{i, 0) = and F{i, 1) = yf 
end for 

for j = (J-1) : -1 : 1 do 
for i = (d^-i + 1) : d^ do 
F{i,0) =0 andF(i,l) ^ yf 
G(i,0) = and G{i,l) ^0 
for r = I : d do 

{or I = 2: mm[l{j),rl{j + 1) + 1] do 
s_ = max{l, / - [(r - l)l{j + 1) + 1]} 
s+=min[;-l,/(j + l)] 
s = argmax {F[d{i — 1) + r, s] + F{i, I — s)} 

F{i, I) = F[d{i - 1) + r, s] + F{i, I - s) 
G(i,/) = G(i,/-J), Gr{i,l)^s 
end for 

'We also need to define two temporary variables F and G which represent 
updates to F and G during the incoiporation of a child. 



for / = 2 : mm[l{j),rl{j + 1) + 1] do 

F{i,l) ^ F{i,l) and G{i,l) = G{i,l) 
end for 
end for 
end for 
end for 
for r = 2 : d do 

for / = 2 : min[A:, (r - 1)^(1) + 1] do 
s_ = max{l, l-[{r- 2)l{l) + 1]} 
s+ = niin[/ — 1, ^(1)] 
s = argmax [F{r^ s) + i^(l, / — s)] 

F{l,l)^F{r,s)+F{l,l-s) 
G{1,1) =G{l,l-s), Gr{l,l) = s 

end for 

for Z = 2 : m[n[k, {r - l)l{l) + 1] do 
F{1, 1) = F(l, and G(l, I) = G(l, 

end for 
end for 

Backtrack to identify solution: 

r = 0, ri = 1, Ti = fc 
for j == : ( J - 1) do 

for i = max[2, (d^-i + 1)] : d^ do 
if Ti = 1 then 

for r — max(l, 2 — j) : d do 
ifGrii,Ti} > then 

Td{i-l)+r = 1 
^d(i-l)+r = Gr{i,Ti) 

end if 
end for 
end if 
end for 
end for 

Calculate solution: 
for i = 1 : A^ do 

end for 



Output: y e 



t,N 



IV. Complexity analysis 



To establish the complexity of FTP, we note that the domi- 
nating operations are additions and comparisons, and therefore 
it suffices to bound the total number of these operations. 

Theorem 2: Given y £ M.^ , the FTP algorithm requires at 
most {3d'^Nk + N) additions/comparisons to calculate Vk{y), 
defined in ([T]). 

Proof: Consider the first pass of FTP. In each level j 
with 1 < j < J — 1, there are (d — 1) • d^^^ nodes. 
For a given node i at level j, each of its d children are 
incorporated successively. Fach time a child is incorporated, 
we calculate optimal subtrees for each I such that 2 < I < 
Imax- Each of these calculations requires at most (/ — 1) 
additions, each one accompanied by a comparison, and we 
have Irnax < ^j), where l{j) is defined in (J4|. The number 
of additions/comparisons needed to incorporate a child is 



therefore bounded by 



Finally, note that the second pass involves no additions and at 



2j2ii-i)<i{jmj)-i]<[iij)]' 



(5) 



1=2 



Combining all these observations, and writing C for the total 
number of additions/comparisons in the first pass of the 
algorithm for levels I < j < J — 1 (i.e. excluding the root 
node), we have 



j-i 



c<j2{id~i)d^-'-d-[iij)f} 



i=i 



To bound l{j), observe from (J4| that 



(6) 



(7) 



To determine which of (i'+^ ^ or k gives a tighter bound, let 
1<P<J— Ihe the unique positive integer such that 



dP-^ <k<dP. 
First let us assume p > 2. We have 
k < d-'+^-i ■i=^dP < d'^+^-^ <= 



(8) 



j <J + 1~P, 



which we may combine with (|6| and (J7]i to deduce 

J+l-p ,7-1 

C< ^ {d-l)d^-k^+ J2 {d-l)d'-d^^-'+^~''> (9) 

j=l j=J+2-p 



J+l-p 

= (d - 1) <j dk^ Y, ^^'^ 





and since (i'^+^ p — 1 < rf-'+i p, we can further deduce from 
^ that 

c<e- d'+^-p + d'+^+p, 

to which we can make the substitution N = d^ and apply the 
bounds ([8]), concluding that 



C<k^ ■ 



d^N 



+ d^Nk = 2d^Nk. 



(11) 



On the other hand, if p < 2, then k < d^ < d'^+^-3 for all j < 
J — 1, and so the second summation in (|9]l is empty. We may 
then follow the same argument to bound the first summation, 
obtaining C < 2d'^Nk in this case also. Meanwhile, the root 
node has (rf — 1) children, which may be combined with (jsll 
and (jTll to deduce that the number of additions/comparisons 
for the root node is bounded by 



{d-l)k^ <d^Nk. 



most N comparisons, which combines with (111 and (12i to 
yield the result. D 

It follows from Theorem [2] that FTP has complexity 
©(A^/c)!^ We may compare the complexity of FTP with that 
of the other approximate tree projection methods discussed in 
Section |n] GTA Q and the CSSA ^ are both e'(iVlogiV) 
while the CPSS fSl is 0{N). Provided logiV < fc, we can see 
that the price we pay for guaranteeing an exact tree projection 
is an increased order of complexity. 

The algorithm is easy to implement and we have ex- 
perimented with random tests successfully; its cost is not 
disappointing compared to approximate methods such as the 
CSSA and the CPSSE] 

V. Conclusion 

We have presented an algorithm for exact tree projection 
in the context of wavelets and have proved that it has 0{Nk) 
worst-case complexity. The DP algorithm described here could 
also be applied to any tree-based signal representations. Ap- 
proximate algorithms may still offer an advantage in terms 
of computational efficiency, and therefore the practitioner's 
choice of tree projection algorithm will depend upon whether 
guaranteed precision or algorithm running time is the overrid- 
ing priority. 
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